Analysis of Valuable Clustering Techniques for Deep Web Access and Navigation
نویسندگان
چکیده
A massive amount of content is available on web but huge portion of it is still invisible. User can only access this hidden web, also called Deep web, by entering a directed query in a web search form and thus accessing the data from database which is not indexed with hyperlinks. Inability to index particular type of content and restricted storage capacity is significant factor behind the invisibleness of web content. Different clustering techniques offer a simple way to analyze large volume of non-indexed content. The major focus of research is to analyze the different clustering techniques to find more accurate and efficient method for accessing and navigating the deep web content. Analysis and comparison of Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), and Hierarchical and K-means method have been carried out and valuable factors for clustering in deep web have been identified. Keywords—Deep web; clustering; Latent Diriclet Allocation; Latent Semantic Analysis; hierarchical methods; K-means methods
منابع مشابه
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares active user’s items rating with historical rating records of other users to find similar users and recommending items which seems interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
متن کاملEfficient Web Usage Mining Based on K-Medoids Clustering Technique
Web Usage Mining is the application of data mining techniques to find usage patterns from web log data, so as to grasp required patterns and serve the requirements of Web-based applications. User’s expertise on the internet may be improved by minimizing user’s web access latency. This may be done by predicting the future search page earlier and the same may be prefetched and cached. Therefore, ...
متن کاملdesigning and implementing a 3D indoor navigation web application
During the recent years, the need arises for indoor navigation systems for guidance of a client in natural hazards and fire, due to the fact that human settlements have been complicating. This research paper aims to design and implement a visual indoor navigation web application. The designed system processes CityGML data model automatically and then, extracts semantic, topologic and geometric...
متن کاملAn Efficient Agglomerative Clustering Algorithm for Web Navigation Pattern Identification
Web log mining is analysis of web log files with web page sequences. Discovering user access patterns from web access are necessary for building adaptive web servers, to improve e-commerce, to carry out cross-marketing, for web personalization, to predict web access sequence etc. In this paper, a new agglomerative clustering technique is proposed to identify users with similar interest, and to ...
متن کاملAccess Patterns in Web Log Data: A Review
The traffic on World Wide Web is increasing rapidly and huge amount of information is generated due to users interactions with web sites. To utilize this information, identifying usage pattern of users is very important. Web Usage Mining is the application of data mining techniques to discover the useful, hidden information about the users and interesting patterns from data extracted from Web L...
متن کامل